Statistical Language Modeling Using Grammatical Information

Author

  • Dennis Grinberg
Abstract

We propose to investigate the use of grammatical information to build improved statistical language models. Until recently, language models were primarily influenced by local lexical constraints. Today, language models often utilize longer-range lexical information to aid in their predictions. All of these language models ignore grammatical considerations other than those induced by the statistics of lexical constraints. We believe that properly incorporating additional grammatical structure will yield improved language models. We will use link grammar as our grammatical base. Being highly lexical in nature, the link grammar formalism will allow us to integrate more traditional modeling schemes with grammatical ones. An efficient, robust link grammar parser will assist in this undertaking. We will initially build finite-state-based language models that utilize relatively simple grammatical information, such as part-of-speech data, along with information sources used by other language models. Our models feature a new framework for probabilistic automata that makes use of hidden data to construct context-sensitive probabilities. The maximum entropy principle employed by these Gibbs-Markov models facilitates the easy integration of multiple information sources. We will also build language models that take greater advantage of link grammar by including more sophisticated grammatical considerations. These models will include both probabilistic automata and models more closely related to the link grammar formalism. The expected contributions of this work are to demonstrate that grammatical information can be used to construct language models with low perplexity, and that such models can reduce the error rates of speech recognition systems.
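A minimal sketch of the log-linear (Gibbs) form that such maximum-entropy models take, showing how a lexical information source and a part-of-speech information source can be combined in one next-word distribution and scored by perplexity. The vocabulary, tag set, feature functions, and weights below are invented for illustration only; this is not the thesis's Gibbs-Markov automaton, just the general modeling idea.

import math

VOCAB = ["the", "dog", "barks", "runs"]
POS = {"the": "DET", "dog": "NOUN", "barks": "VERB", "runs": "VERB"}

# Feature functions f_i(previous_word, next_word); weights are illustrative only.
def f_bigram(prev, w):
    # Lexical constraint: the specific bigram "the dog".
    return 1.0 if (prev, w) == ("the", "dog") else 0.0

def f_pos(prev, w):
    # Grammatical constraint: a noun tends to be followed by a verb.
    return 1.0 if POS.get(prev) == "NOUN" and POS[w] == "VERB" else 0.0

WEIGHTS = [(f_bigram, 2.0), (f_pos, 1.5)]

def prob(w, prev):
    # Gibbs form: p(w | prev) = exp(sum_i lambda_i * f_i(prev, w)) / Z(prev).
    score = lambda v: math.exp(sum(lam * f(prev, v) for f, lam in WEIGHTS))
    z = sum(score(v) for v in VOCAB)
    return score(w) / z

def perplexity(words):
    # Per-word perplexity of the sequence under the toy model.
    logprob = sum(math.log2(prob(w, p)) for p, w in zip(words, words[1:]))
    return 2 ** (-logprob / (len(words) - 1))

if __name__ == "__main__":
    print(f"perplexity = {perplexity(['the', 'dog', 'barks']):.3f}")

Adding a new information source under this scheme only requires adding a feature function and a weight; the normalization term Z keeps the result a proper probability distribution, which is what makes the maximum-entropy framework convenient for combining heterogeneous constraints.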


Similar resources

Assessing the Ability of LSTMs to Learn Syntax-Sensitive Dependencies

The success of long short-term memory (LSTM) neural networks in language processing is typically attributed to their ability to capture long-distance statistical regularities. Linguistic regularities are often sensitive to syntactic structure; can such dependencies be captured by LSTMs, which do not have explicit structural representations? We begin addressing this question using number agreeme...


The Impact of Input Enrichment in Long Text vs. Short Texts on Grammatical Accuracy in Writing Among Elementary Language Learners

This study was conducted to investigate the influence of teaching accurate grammar in writing via enriched long text and short text for the elementary students at Shokouhe_Farhang institute. The homogenized subjects were divided into two groups of 18 and 17 participants. A writing exam was used as a pretest to check the students' knowledge of the English past tense. The control group received the...


GIATI: A General Methodology for Finite-State Translation Using Alignments

Statistical techniques have attracted increasing interest from the natural language research community in recent years and have shown powerful possibilities for extracting useful information from translation examples. Both statistical language modeling and statistical machine translation are well-established disciplines with a solid basis and outstanding results. On the other hand, finite-s...


Computational Modeling of the Role of Discourse Information in Language Production and Acquisition

Naho Orita, Doctor of Philosophy, 2015. Dissertation directed by Professor Naomi Feldman, Department of Linguistics. This dissertation explores the role of discourse information in language production and language acquisition. Discourse information plays an important role in v...


Explorations in Using Grammatical Dependencies for Contextual Phrase Translation Disambiguation

Recent research has shown the importance of using source context information to disambiguate source phrases in phrase-based Statistical Machine Translation. Although encouraging results have been obtained, those studies mostly focus on translating into a less inflected target language. In this article, we present an attempt at using source context information to translate from English into Fren...



Publication date: 1995